Corpora and Cognition: The Semantic Composition of Adjectives and Nouns in the Human Brain

Author

  • Alona Fyshe
Abstract

The action of reading, understanding and combining words to create meaningful phrases comes naturally to most people. Still, the processes that govern semantic composition in the human brain are not well understood. In this thesis, we explore semantics (word meaning) and semantic composition (combining the meaning of multiple words) using two data sources: a large text corpus, and brain recordings of people reading adjective-noun phrases. We show that these two very different data sources are both consistent, in that they contain overlapping information, and complementary, in that they contain non-overlapping but still congruent information. These disparate data sources can be used together to further the study of semantics and semantic composition as grounded in the brain, or more abstractly as represented in patterns of word usage in corpora. This thesis is supported by three experiments. Firstly, we extend a matrix factorization algorithm to learn an interpretable semantic space that respects the composition of words into noun phrases. We use the interpretability of the model to explore semantic composition as captured in the statistics of word usage in a large text corpus. Secondly, we build a joint model of semantics in corpora and the brain, which fuses brain imaging data with corpus data into one model of semantics. When compared to models that use only a single data source, we find that this joint model excels at a variety of tasks, from matching human judgements of semantics to predicting words from brain activity. Thirdly, we explore semantic composition in the brain through a new brain image dataset, collected with magnetoencephalography while subjects read adjective-noun phrases. We learn several functions of the brain data that are capable of predicting semantic properties of the adjective, noun, and phrase. From the performance of these functions, we build a theory for the semantic composition of adjective-noun phrases in the brain.
This thesis asks a fundamentally different question than those asked in previous studies of adjective-noun composition: where in the brain and when in time is phrasal meaning located? The answer to this question paints a unique picture of composition in the brain that is congruent with previous findings, but also sheds new light onto the neural processes governing semantic composition. Together, these contributions show that brain imaging data and corpus data can be used in concert to build better models of semantics. These more successful models provide a new understanding of semantic composition, both in the brain and in a more abstract sense. Furthermore, this thesis demonstrates how machine learning techniques can be used to analyze and understand complicated data, like the neural activity captured in brain images.

Similar resources

Developing a Semantic Similarity Judgment Test for Persian Action Verbs and Non-action Nouns in Patients With Brain Injury and Determining its Content Validity

Objective: Evidence from brain trauma suggests that the two grammatical categories of noun and verb are processed in different regions of the brain due to differences in the complexity of grammatical and semantic information processing. Studies have shown that the verbs belonging to different semantic categories lead to neural activity in different areas of the brain, and action verb processing is r...

Classification Of Adjectival And Non-Adjectival Nouns Based On Their Semantic Behavior By Using A Self-Organizing Semantic Map

We treat nouns that behave adjectively, which we call adjectival nouns, extracted from large corpora. For example, in “financial world” and “world of finance,” “financial” and “finance” are different parts of speech, but their semantic behaviors are similar to each other. We investigate how adjectival nouns are similar to adjectives and different from non-adjectival nouns by using self-organizi...

Automatic Construction of Persian ICT WordNet using Princeton WordNet

WordNet is a large lexical database of the English language, in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...

Semantic Clustering in Dutch: Automatically inducing semantic classes from large-scale corpora

Handcrafting semantic classes is a difficult and time-consuming job, and depends on human interpretation. Possible machine learning techniques would be much faster, and do not rely on interpretation, because they stick to the data. The goal of this research is to present some machine learning techniques that make it possible to achieve an automatic clustering of Dutch words. More particularly, ...

Semantic Clustering of Adjectives and Verbs Based on Syntactic Patterns

In this paper we show that some of the syntactic patterns in an NLP lexicon can be used to identify semantically “similar” adjectives and verbs. We define semantic similarity on the basis of parameters used in the literature to classify adjectives and verbs semantically. The semantic clusters obtained from the syntactic encodings in the lexicon are evaluated by comparing them with semantic grou...

Publication date: 2015